Goto

Collaborating Authors

 triangle inequality


AGeneralized Bisimulation Metric of State Similarity between Markov Decision Processes: From Theoretical Propositions to Applications

Neural Information Processing Systems

The bisimulation metric (BSM) is a powerful tool for computing state similarities within a Markov decision process (MDP), revealing that states closer in BSM have more similar optimal value functions. While BSM has been successfully utilized in reinforcement learning (RL) for tasks like state representation learning and policy exploration, its application to multiple-MDP scenarios, such as policy transfer, remains challenging. Prior work has attempted to generalize BSM to pairs of MDPs, but a lack of rigorous analysis of its mathematical properties has limited further theoretical progress. In this work, we formally establish a generalized bisimulation metric (GBSM) between pairs of MDPs, which is rigorously proven with the three fundamental properties: GBSM symmetry, inter-MDP triangle inequality, and the distance bound on identical state spaces. Leveraging these properties, we theoretically analyse policy transfer, state aggregation, and sampling-based estimation in MDPs, obtaining explicit bounds that are strictly tighter than those derived from the standard BSM. Additionally, GBSM provides a closed-form sample complexity for estimation, improving upon existing asymptotic results based on BSM. Numerical results validate our theoretical findings and demonstrate the effectiveness of GBSM in multi-MDP scenarios.


Tight Bounds On The Distortion of Randomized and Deterministic Distributed Voting

Neural Information Processing Systems

We study metric distortion in distributed voting, where nvoters are partitioned into k groups, each selecting a local representative, and a final winner is chosen from these representatives (or from the entire set of candidates). This setting models systems like U.S. presidential elections, where state-level decisions determine the national outcome. We focus on four cost objectives from Anshelevich et al. [1]: avg-avg, avg-max, max-avg, and max-max. We present improved distortion bounds for both deterministic and randomized mechanisms, offering a near-complete characterization of distortion in this model. For deterministic mechanisms, we reduce the upper bound for avg-max from 11 to 7, establish a tight lower bound of 5 for max-avg (improving on 2+ 5), and tighten the upper bound for max-max from 5 to 3. For randomized mechanisms, we consider two settings: (i) only the second stage is randomized, and (ii) both stages may be randomized. In case (i), we prove tight bounds: 5 2/k for avg-avg, 3for avg-max and max-max, and 5for max-avg. In case (ii), we show tight bounds of 3 for max-avg and max-max, and nearly tight bounds for avg-avg and avg-max within [3 2/n, 3 2/(kn)]and [3 2/n, 3], respectively, where n denotes the largest group size.


Efficient k-Sparse Band-Limited Interpolation with Improved Approximation Ratio

Neural Information Processing Systems

We consider the task of interpolating a k-sparse band-limited signal from a small collection of noisy time-domain samples. Exploiting a new analytic framework for hierarchical frequency decomposition that performs systematic noise cancellation, we give the first polynomial-time algorithm with a provable (3+ 2+ฮต)approximation guarantee for continuous interpolation. Our method breaks the long-standing C > 100 barrier set by the best previous algorithms, sharply reducing the gap to optimal recovery and establishing a new state of the art for high-accuracy band-limited interpolation. We also give a refined "shrinking-range" variant that achieves a ( 2+ฮต+c)-approximation on any sub-interval (1 c)T for some c (0,1), which gives even higher interpolation accuracy.


Implicit Regularization in Perturbed Deep Matrix Factorization: Spectral Conditions and Stability

arXiv.org Machine Learning

This paper studies the stability of low-rank implicit regularization in perturbed deep matrix factorization, where the target matrix is corrupted by a noise matrix. We first derive sufficient spectral conditions under which gradient descent exhibits a low-rank phase in the noiseless setting. These conditions show how the target spectrum, initialization, and step size jointly determine the existence of a nonempty low-rank interval. We then analyze the perturbed gradient descent dynamics, proving convergence guarantees and quantifying how the perturbation affects iteration complexity and eigenvalue recovery. Finally, we show that the low-rank phase persists under perturbation, with explicit dependence on the perturbation size. Numerical experiments support the theoretical findings.


Optimizing Computational-Statistical Runtime for Wasserstein Distance Estimation

arXiv.org Machine Learning

Squared Wasserstein distance is a frequently used tool to measure discrepancy between probability distributions. This distance is typically computed between empirical measures of size $n$ from two underlying random samples. Unfortunately, even in lower dimensional Euclidean space problems $\left( d \in \{2,3\} \right)$, algorithms for Wasserstein distance computation with approximate or exact precision guarantees scale poorly in the runtime as a function of $n$ and the desired precision. In response, we consider the computational-statistical runtime, where the goal is to estimate from samples the Wasserstein distance between potentially smooth measures up to $ฮต$-additive error in expectation with respect to the sampling; we allow $O(1)$ computational cost for collecting a sample. Towards this, we develop a Sample-Sketch-Solve paradigm where we introduce a regular cartesian grid sketch of the samples. We show that (especially under $ฮฑ$-Hรถlder smooth distributions) this can compress the data without increasing asymptotic error, and also regularizes the structure which enables faster exact algorithms. Ultimately, we approximate $W_2^2(P,Q)$ within $ฮต$ error in $ฮต^{-\max(2,\frac{d+1+o(1)}{1+ฮฑ})}$ time for $0 < ฮฑ< 1$ Hรถlder smooth distributions $P,Q$ on $(0,1)^{d}$; an optimal $ฮ˜(ฮต^{-2})$ for $ฮฑ> 1/2$ when $d=2$ and nearly optimal as $ฮฑ\to 1$ when $d = 3$.


Multiscale Euclidean Network Trajectories: Second-Moment Geometry, Attribution, and Change Points

arXiv.org Machine Learning

A central challenge in dynamic network analysis is to represent temporal evolution in a way that is both geometrically meaningful and statistically identifiable. One approach embeds a sequence of network snapshots as trajectories in a Euclidean space and relates these trajectories to node embeddings. In multilayer and unfolded spectral constructions, however, node embeddings and their underlying latent positions are identifiable only up to general linear transformations. Although this ambiguity preserves edge probabilities, it can distort geometry and invalidate distance based temporal comparisons at both the trajectory and node-levels. We develop Multiscale Euclidean Network Trajectories (MENT), a framework for multiscale temporal trajectories based on second-moment geometry. By imposing an isotropic normalization on the anchor latent positions, we reduce the relevant ambiguity to orthogonal transformations and prevent distortion of the second-moment geometry. In this canonical representation, we define a trace variation distance and mode-wise variation distances along orthogonal directions, and use multidimensional scaling to obtain low-dimensional trajectories of time points at both global and mode-wise levels. The resulting trajectories support interpretation and inference. They admit mode-wise decompositions, support attribution of global and mode-wise temporal changes to nodes, and enable change point detection through 1D trajectories. We prove consistency of the proposed unfolded spectral embedding and of the induced temporal trajectories. Experiments on two synthetic and two real dynamic networks illustrate stable and interpretable recovery of temporal structure and show strong performance against existing change point detection baselines.


Adaptive Estimation and Inference in Semi-parametric Heterogeneous Clustered Multitask Learning via Neyman Orthogonality

arXiv.org Machine Learning

We study clustered multitask learning in a semiparametric setting where tasks share a latent cluster structure in their target parameters but exhibit heterogeneous, potentially infinite-dimensional nuisance components. Such heterogeneity poses a major challenge for existing multitask learning methods, which typically rely on aligned feature spaces or homogeneous task structures. To address this challenge, we propose an adaptive fused orthogonal estimator that integrates Neyman-orthogonal losses with data-driven pairwise fusion penalties. Our framework leverages task-specific pilot estimates to calibrate the fusion penalties and combines adaptive aggregation with orthogonalization to mitigate the impact of nuisance-parameter estimation error. Theoretically, we show that the proposed estimator achieves exact recovery of the latent clustering with high probability and attains pooled parametric convergence rates proportional to cluster size. Moreover, we establish asymptotic normality and show that, asymptotically, our estimator matches the performance of an oracle procedure that knows the true clustering in advance. Empirically, we show that the proposed method consistently outperforms strong baselines in various simulation setups. A real-world application to U.S. residential energy consumption demonstrates the effectiveness of our approach in uncovering meaningful regional clustering in electricity price elasticity, showcasing the efficacy of our method.